Enhanced Cerro Rico AMD Chemistry Analysis (2006–2007)

Author

Katerina Bischel

Published

July 10, 2025

Executive Summary

This enhanced analysis examines acid mine drainage (AMD) chemistry data from the Cerro Rico mining area collected during 2006-2007. The study provides comprehensive insights into seasonal variations in metal concentrations, pH levels, environmental risks, and associated impacts on water quality and ecosystem health.

Key Findings: - Extreme acidification with pH values ranging from 0.9 to 6.9 - Severe metal contamination exceeding regulatory standards by orders of magnitude - Significant seasonal variations in metal loading - Critical environmental risks requiring immediate intervention

Setup and Enhanced Configuration

Load Required Libraries

Code
# Core data manipulation and visualization
library(tidyverse)
library(readxl)
library(janitor)
library(patchwork)
library(scales)
library(DT)
library(kableExtra)

# Statistical analysis
library(broom)
library(corrplot)
library(cluster)

# Data quality and exploration
library(skimr)

# Spatial and temporal analysis
library(lubridate)

# Advanced visualization
library(RColorBrewer)
library(viridis)
library(ggridges)

# File handling
library(here)

# Set global options
options(scipen = 999, digits = 4)
knitr::opts_chunk$set(fig.retina = 2, dpi = 300)

Enhanced Utility Functions

Code
# Advanced file validation with detailed reporting
validate_files <- function(files, file_type = "data") {
  if (length(files) == 0) {
    stop(paste("No", file_type, "files found matching the pattern"))
  }
  
  validation_results <- tibble(
    file = files,
    exists = file.exists(files),
    size_mb = file.size(files) / (1024^2),
    modified = file.mtime(files)
  )
  
  missing_files <- validation_results %>% filter(!exists)
  if (nrow(missing_files) > 0) {
    stop(paste("Missing files:", paste(missing_files$file, collapse = ", ")))
  }
  
  cat("✓ Validated", length(files), file_type, "files\n")
  cat("  Total size:", round(sum(validation_results$size_mb), 2), "MB\n")
  return(validation_results)
}

# Enhanced column validation with suggestions
validate_columns <- function(df, required_cols, dataset_name = "dataset") {
  missing_cols <- setdiff(required_cols, names(df))
  present_cols <- intersect(required_cols, names(df))
  
  if (length(missing_cols) > 0) {
    # Suggest similar column names
    suggestions <- map_chr(missing_cols, function(col) {
      if (length(names(df)) > 0) {
        distances <- adist(col, names(df), ignore.case = TRUE)
        if (min(distances) <= 2) {
          names(df)[which.min(distances)]
        } else {
          "No suggestion"
        }
      } else {
        "No suggestion"
      }
    })
    
    warning_msg <- paste(
      "Missing columns in", dataset_name, ":",
      paste(missing_cols, collapse = ", "),
      "\nSuggestions:", paste(suggestions, collapse = ", ")
    )
    warning(warning_msg)
  }
  
  cat("✓", dataset_name, "has", length(present_cols), "of", 
      length(required_cols), "expected columns\n")
  return(present_cols)
}

# Advanced metal detection with confidence scoring
detect_metal_columns <- function(df) {
  # Comprehensive metal database
  metal_db <- tibble(
    symbol = c("al", "as", "cd", "co", "cr", "cu", "fe", "mn", "ni", "pb", "zn", 
               "ag", "ba", "ca", "mg", "na", "k", "s", "si", "sb", "se", "mo", "v", "hg", "tl"),
    name = c("aluminum", "arsenic", "cadmium", "cobalt", "chromium", "copper", 
             "iron", "manganese", "nickel", "lead", "zinc", "silver", "barium", 
             "calcium", "magnesium", "sodium", "potassium", "sulfur", "silicon", 
             "antimony", "selenium", "molybdenum", "vanadium", "mercury", "thallium"),
    priority = c(1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 3, 3, 3, 3, 2, 3, 2, 2, 2, 2, 1, 1)
  )
  
  col_names <- tolower(names(df))
  detected_metals <- tibble(
    column = character(),
    metal = character(),
    confidence = numeric(),
    priority = numeric()
  )
  
  for (i in seq_along(col_names)) {
    col <- col_names[i]
    
    # Direct symbol match
    symbol_match <- metal_db %>% filter(symbol == col)
    if (nrow(symbol_match) > 0) {
      detected_metals <- bind_rows(detected_metals, tibble(
        column = names(df)[i],
        metal = symbol_match$symbol,
        confidence = 1.0,
        priority = symbol_match$priority
      ))
    }
    
    # Pattern matching
    for (j in seq_len(nrow(metal_db))) {
      if (grepl(paste0("^", metal_db$symbol[j], "[_-]"), col) || 
          grepl(paste0("[_-]", metal_db$symbol[j], "$"), col)) {
        detected_metals <- bind_rows(detected_metals, tibble(
          column = names(df)[i],
          metal = metal_db$symbol[j],
          confidence = 0.8,
          priority = metal_db$priority[j]
        ))
      }
    }
  }
  
  # Remove duplicates and sort by priority
  detected_metals <- detected_metals %>%
    distinct(column, .keep_all = TRUE) %>%
    arrange(priority, desc(confidence))
  
  cat("✓ Detected", nrow(detected_metals), "metal columns:\n")
  if (nrow(detected_metals) > 0) {
    print(detected_metals)
  }
  
  return(detected_metals$column)
}

# Comprehensive data cleaning with quality scoring
clean_amd_data <- function(df, dataset_name = "dataset") {
  original_rows <- nrow(df)
  original_cols <- ncol(df)
  
  # Calculate initial quality score
  initial_missing <- sum(is.na(df)) / (nrow(df) * ncol(df))
  
  df_clean <- df %>%
    # Remove completely empty rows/columns
    remove_empty(c("rows", "cols")) %>%
    # Standardize column names
    clean_names() %>%
    # Advanced outlier detection and flagging
    mutate(
      # Handle negative values appropriately
      across(where(is.numeric), ~case_when(
        cur_column() %in% c("temp_c", "temperature") ~ .x,  # Temperature can be negative
        .x < 0 ~ NA_real_,  # Other measurements shouldn't be negative
        TRUE ~ .x
      )),
      # Quality flags
      row_id = row_number(),
      missing_count = rowSums(is.na(across(everything()))),
      completeness = 1 - (missing_count / ncol(.))
    )
  
  # pH validation (only if pH column exists)
  if("p_h" %in% names(df_clean)) {
    df_clean <- df_clean %>%
      mutate(
        ph_flag = case_when(
          p_h < 0 | p_h > 14 ~ "out_of_range",
          is.na(p_h) ~ "missing",
          p_h < 1 ~ "extremely_low",
          p_h > 12 ~ "extremely_high",
          TRUE ~ "valid"
        )
      )
  }
  
  # Flow validation (only if flow column exists)
  if("q_l_s" %in% names(df_clean)) {
    df_clean <- df_clean %>%
      mutate(
        flow_flag = case_when(
          q_l_s < 0 ~ "negative",
          q_l_s > 1000 ~ "extremely_high",
          is.na(q_l_s) ~ "missing",
          TRUE ~ "valid"
        )
      )
  }
  
  # Overall quality assessment
  df_clean <- df_clean %>%
    mutate(
      data_quality = case_when(
        completeness > 0.9 ~ "excellent",
        completeness > 0.8 ~ "good",
        completeness > 0.6 ~ "fair",
        completeness > 0.4 ~ "poor",
        TRUE ~ "very_poor"
      )
    )
  
  # Calculate final quality score
  quality_cols <- names(df_clean)[!names(df_clean) %in% c("row_id", "missing_count", "completeness", "data_quality", "ph_flag", "flow_flag")]
  final_missing <- sum(is.na(df_clean[quality_cols])) / (nrow(df_clean) * length(quality_cols))
  
  cat("✓ Enhanced cleaning of", dataset_name, ":\n")
  cat("  Rows:", original_rows, "→", nrow(df_clean), "\n")
  cat("  Columns:", original_cols, "→", ncol(df_clean), "\n")
  cat("  Missing data:", round(initial_missing * 100, 1), "% →", round(final_missing * 100, 1), "%\n")
  
  return(df_clean)
}

# Advanced statistical theme
theme_amd_enhanced <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      # Text elements
      plot.title = element_text(hjust = 0.5, size = rel(1.3), face = "bold", margin = margin(b = 20)),
      plot.subtitle = element_text(hjust = 0.5, size = rel(1.1), color = "grey40", margin = margin(b = 15)),
      plot.caption = element_text(hjust = 0, size = rel(0.8), color = "grey50"),
      
      # Axes
      axis.title = element_text(size = rel(1.1), face = "bold"),
      axis.text = element_text(size = rel(0.9)),
      axis.text.x = element_text(angle = 45, hjust = 1),
      
      # Legend
      legend.position = "bottom",
      legend.title = element_text(size = rel(1.0), face = "bold"),
      legend.text = element_text(size = rel(0.9)),
      legend.box.margin = margin(t = 20),
      
      # Panels
      panel.grid.major = element_line(color = "grey90", linewidth = 0.5),
      panel.grid.minor = element_blank(),
      panel.border = element_rect(color = "grey80", fill = NA, linewidth = 0.5),
      
      # Strips
      strip.background = element_rect(fill = "grey95", color = "white"),
      strip.text = element_text(face = "bold", size = rel(1.0)),
      
      # Overall appearance
      plot.background = element_rect(fill = "white", color = NA),
      panel.background = element_rect(fill = "white", color = NA)
    )
}

# Set enhanced theme
theme_set(theme_amd_enhanced())

Enhanced Data Loading and Validation

Code
cat("=== ENHANCED AMD ANALYSIS STARTING ===\n")
=== ENHANCED AMD ANALYSIS STARTING ===
Code
# Enhanced file detection with pattern matching
base_path <- tryCatch({
  here("data", "raw")
}, error = function(e) {
  warning("Could not establish base path. Using current directory.")
  getwd()
})

# Comprehensive file search patterns
file_patterns <- list(
  metal = c("*metal*.xlsx", "*Metal*.xlsx", "*METAL*.xlsx", "*chemistry*.xlsx", "*chem*.xlsx"),
  physical = c("*physical*.xlsx", "*Physical*.xlsx", "*PHYSICAL*.xlsx", "*phys*.xlsx", "*param*.xlsx")
)

# Search for files
metal_files <- character(0)
phys_files <- character(0)

for (pattern in file_patterns$metal) {
  found_files <- Sys.glob(file.path(base_path, pattern))
  metal_files <- c(metal_files, found_files)
}

for (pattern in file_patterns$physical) {
  found_files <- Sys.glob(file.path(base_path, pattern))
  phys_files <- c(phys_files, found_files)
}

# Remove duplicates
metal_files <- unique(metal_files)
phys_files <- unique(phys_files)

cat("Found", length(metal_files), "metal files and", length(phys_files), "physical files\n")
Found 2 metal files and 3 physical files
Code
# Validate files if found
if (length(metal_files) > 0) validate_files(metal_files, "metal")
✓ Validated 2 metal files
  Total size: 0.02 MB
# A tibble: 2 × 4
  file                                        exists size_mb modified           
  <chr>                                       <lgl>    <dbl> <dttm>             
1 /Users/ktdroppa/Desktop/riverremedy/data/r… TRUE   0.0109  2025-07-07 12:43:14
2 /Users/ktdroppa/Desktop/riverremedy/data/r… TRUE   0.00637 2025-07-07 12:42:42
Code
if (length(phys_files) > 0) validate_files(phys_files, "physical")
✓ Validated 3 physical files
  Total size: 0.02 MB
# A tibble: 3 × 4
  file                                        exists size_mb modified           
  <chr>                                       <lgl>    <dbl> <dttm>             
1 /Users/ktdroppa/Desktop/riverremedy/data/r… TRUE   0.00711 2025-07-07 12:42:46
2 /Users/ktdroppa/Desktop/riverremedy/data/r… TRUE   0.00521 2025-07-07 12:42:48
3 /Users/ktdroppa/Desktop/riverremedy/data/r… TRUE   0.00552 2025-07-07 12:42:51

Enhanced Data Import and Processing

Code
# Enhanced data loading with error handling
load_excel_robust <- function(file_path, sheet = NULL) {
  tryCatch({
    # Try different sheet options
    if (is.null(sheet)) {
      sheets <- excel_sheets(file_path)
      if (length(sheets) > 1) {
        cat("Multiple sheets found:", paste(sheets, collapse = ", "), "\n")
        sheet <- sheets[1]  # Use first sheet
      }
    }
    
    data <- read_excel(file_path, sheet = sheet) %>%
      clean_names()
    
    cat("✓ Loaded", file_path, "with", nrow(data), "rows and", ncol(data), "columns\n")
    return(data)
    
  }, error = function(e) {
    warning(paste("Failed to load", file_path, ":", e$message))
    return(NULL)
  })
}

# Load real data if available
if (length(metal_files) > 0 && length(phys_files) > 0) {
  # Load metal data
  metals_list <- map(metal_files, load_excel_robust)
  metals_list <- metals_list[!map_lgl(metals_list, is.null)]
  
  if (length(metals_list) > 0) {
    metals_raw <- bind_rows(metals_list, .id = "file_id")
  } else {
    metals_raw <- NULL
  }
  
  # Load physical data
  phys_list <- map(phys_files, load_excel_robust)
  phys_list <- phys_list[!map_lgl(phys_list, is.null)]
  
  if (length(phys_list) > 0) {
    phys_raw <- bind_rows(phys_list, .id = "file_id")
  } else {
    phys_raw <- NULL
  }
  
  # Combine if both exist
  if (!is.null(metals_raw) && !is.null(phys_raw)) {
    # Clean individual datasets
    metals <- clean_amd_data(metals_raw, "metals")
    phys <- clean_amd_data(phys_raw, "physical")
    
    # Identify common columns for joining
    common_cols <- intersect(names(metals), names(phys))
    join_cols <- intersect(c("site", "season", "n", "sample_id", "date"), common_cols)
    
    if (length(join_cols) > 0) {
      amd_raw <- full_join(metals, phys, by = join_cols, suffix = c("_metal", "_phys"))
      cat("✓ Joined datasets on:", paste(join_cols, collapse = ", "), "\n")
    } else {
      cat("⚠ No common columns found for joining. Combining by row binding.\n")
      amd_raw <- bind_rows(metals, phys)
    }
  } else {
    amd_raw <- NULL
  }
} else {
  amd_raw <- NULL
}
✓ Loaded /Users/ktdroppa/Desktop/riverremedy/data/raw/Strosnider_2011_metal_data_dry_2006.xlsx with 12 rows and 15 columns
✓ Loaded /Users/ktdroppa/Desktop/riverremedy/data/raw/Strosnider_2011_metal_data_wet_2007.xlsx with 16 rows and 15 columns
✓ Loaded /Users/ktdroppa/Desktop/riverremedy/data/raw/Strosnider_2011_physical_data_all.xlsx with 28 rows and 10 columns
✓ Loaded /Users/ktdroppa/Desktop/riverremedy/data/raw/Strosnider_2011_physical_data_dry_2006.xlsx with 12 rows and 10 columns
✓ Loaded /Users/ktdroppa/Desktop/riverremedy/data/raw/Strosnider_2011_physical_data_wet_2007.xlsx with 16 rows and 10 columns
✓ Enhanced cleaning of metals :
  Rows: 28 → 28 
  Columns: 16 → 20 
  Missing data: 0 % → 0 %
✓ Enhanced cleaning of physical :
  Rows: 56 → 56 
  Columns: 11 → 17 
  Missing data: 3.2 % → 6.2 %
✓ Joined datasets on: site, season, n 
Code
# Create enhanced sample data based on actual Strosnider et al. (2011) findings
if (is.null(amd_raw)) {
  cat("⚠ No data files found. Creating sample dataset based on Strosnider et al. (2011) findings.\n")
  
  set.seed(42)
  
  # Based on Table 1 and 2 from the paper - actual AMD discharge sites
  site_codes <- c("1C", "2C", "3C", "4C", "5C", "6C", "7C", 
                  "1A", "2A", "3A", "4A", "5A", 
                  "1T", "2T", "3T", "4T")
  
  # Create samples for both dry and wet seasons as in the paper
  n_samples <- length(site_codes) * 2  # dry and wet season
  
  # Generate realistic data based on actual measurements in Tables 1-2
  amd_raw <- tibble(
    site = rep(site_codes, 2),
    season = rep(c("dry", "wet"), each = length(site_codes)),
    
    # pH values based on Table 1 (range 0.9-6.9)
    p_h = case_when(
      site %in% c("1C", "2C", "3C", "4C", "5C", "6C") ~ runif(n_samples, 2.5, 3.6),
      site == "7C" ~ ifelse(season == "wet", 2.1, NA),
      site %in% c("1A", "2A", "3A") ~ runif(n_samples, 2.9, 4.5),
      site %in% c("4A", "5A") ~ runif(n_samples, 6.8, 6.9),
      site %in% c("1T", "2T") ~ runif(n_samples, 3.0, 4.2),
      site == "3T" ~ runif(n_samples, 6.4, 6.5),
      site == "4T" ~ 0.9,  # Most extreme case from paper
      TRUE ~ runif(n_samples, 3.0, 4.0)
    ),
    
    # Temperature (reasonable range for Potosí elevation ~4000m)
    temp_c = rnorm(n_samples, 12, 3),
    
    # Specific conductance based on Table 1 (μS/cm)
    cond_s_cm = case_when(
      site == "4T" ~ 54600,  # Extreme case
      site %in% c("4A", "5A", "3T") ~ runif(n_samples, 272, 960),
      site %in% c("1T", "2T") ~ runif(n_samples, 1115, 2820),
      TRUE ~ runif(n_samples, 3000, 23000)
    ),
    
    # Flow rates based on Table 1 (L/s)
    q_l_s = case_when(
      season == "wet" ~ abs(rnorm(n_samples, 0.3, 0.4)),
      TRUE ~ abs(rnorm(n_samples, 0.1, 0.2))
    ),
    
    # Metal concentrations based on Table 2 ranges (mg/L)
    # Iron: 0.12 to 72,100 mg/L
    fe = case_when(
      site == "4T" ~ ifelse(season == "wet", 72100, 70500),  # Extreme Pailaviri tailings
      site %in% c("4A", "5A") ~ runif(n_samples, 0.12, 3.12),
      site %in% c("2T", "3T") ~ runif(n_samples, 0.14, 0.36),
      TRUE ~ runif(n_samples, 100, 8130)
    ),
    
    # Aluminum: 0.11 to 7,480 mg/L
    al = case_when(
      site == "4T" ~ 7480,  # Most extreme
      site %in% c("4A", "5A") ~ runif(n_samples, 0.11, 3.82),
      TRUE ~ runif(n_samples, 5, 1300)
    ),
    
    # Zinc: 0.24 to 19,600 mg/L  
    zn = case_when(
      site == "4T" ~ ifelse(season == "wet", 1660, 19600),
      site %in% c("4A", "5A") ~ runif(n_samples, 0.24, 0.26),
      TRUE ~ runif(n_samples, 100, 10500)
    ),
    
    # Manganese: 0.3 to 402 mg/L
    mn = case_when(
      site == "4T" ~ runif(n_samples, 91, 96),
      site %in% c("4A", "5A") ~ runif(n_samples, 0.3, 4.0),
      TRUE ~ runif(n_samples, 9, 402)
    ),
    
    # Copper: <0.001 to 310 mg/L
    cu = case_when(
      site == "4T" ~ 310,  # Extreme case
      site %in% c("4A", "5A") ~ runif(n_samples, 0.001, 0.002),
      TRUE ~ runif(n_samples, 0.1, 227)
    ),
    
    # Lead: <0.012 to 34.8 mg/L
    pb = case_when(
      site == "4T" ~ runif(n_samples, 24, 26),
      site %in% c("4A", "5A") ~ runif(n_samples, 0.012, 0.043),
      TRUE ~ runif(n_samples, 0.05, 34.8)
    ),
    
    # Cadmium: <0.0006 to 65.3 mg/L
    cd = case_when(
      site == "4T" ~ runif(n_samples, 15, 17),
      site %in% c("4A", "5A") ~ runif(n_samples, 0.0006, 0.001),
      TRUE ~ runif(n_samples, 0.1, 65.3)
    ),
    
    # Arsenic: <0.022 to 889 mg/L
    as = case_when(
      site == "4T" ~ 889,  # Extreme case
      site %in% c("4A", "5A", "1T", "2T", "3T") ~ 0.022,  # Below detection
      TRUE ~ runif(n_samples, 0.5, 180)
    ),
    
    # Additional metals
    cr = case_when(
      site == "4T" ~ runif(n_samples, 2.5, 2.6),
      TRUE ~ runif(n_samples, 0.001, 1.1)
    ),
    
    ni = case_when(
      site == "4T" ~ runif(n_samples, 11, 12),
      TRUE ~ runif(n_samples, 0.004, 6.0)
    ),
    
    # Net acidity (mg/L as CaCO3 equivalent) - based on Table 1
    net_acidity = case_when(
      site == "4T" ~ runif(n_samples, 181000, 246000),  # Extreme
      site %in% c("4A", "5A", "3T") ~ runif(n_samples, -46, -6),  # Negative (alkaline)
      TRUE ~ runif(n_samples, 1000, 53500)
    ),
    
    # Sulfate concentrations
    so4 = case_when(
      site == "4T" ~ 175000,  # Extreme
      site %in% c("4A", "5A") ~ runif(n_samples, 232, 257),
      TRUE ~ runif(n_samples, 500, 34000)
    )
  ) %>%
  # Remove NA values and add dates
  filter(!is.na(p_h)) %>%
  mutate(
    date = case_when(
      season == "dry" ~ as.Date("2006-08-01"),   # July-August 2006
      season == "wet" ~ as.Date("2007-03-15")    # March 2007
    ),
    
    # Add sample quality indicators based on site characteristics
    sample_quality = case_when(
      site %in% c("4A", "5A", "3T") ~ "high",      # Near-neutral pH sites
      site == "4T" ~ "extreme",                    # Most contaminated
      site %in% c("1A", "2A", "3A") ~ "poor",      # Abandoned portals
      TRUE ~ "low"                                 # Active portals
    ),
    
    # Add some realistic missing data patterns (5% missing for trace metals)
    across(c(cd, cr, ni, as), ~ ifelse(runif(n()) < 0.05, NA, .x))
  )
  
  cat("✓ Created sample dataset based on Strosnider et al. (2011) with", nrow(amd_raw), "samples\n")
  cat("  Sites included:", paste(unique(amd_raw$site), collapse = ", "), "\n")
  cat("  pH range:", round(min(amd_raw$p_h, na.rm = TRUE), 1), "to", round(max(amd_raw$p_h, na.rm = TRUE), 1), "\n")
  cat("  Most extreme site (4T - Pailaviri tailings): pH =", round(min(amd_raw$p_h[amd_raw$site == "4T"], na.rm = TRUE), 1), "\n")
}

# Final cleaning and enhancement
amd <- clean_amd_data(amd_raw, "combined AMD")
✓ Enhanced cleaning of combined AMD :
  Rows: 62 → 62 
  Columns: 34 → 38 
  Missing data: 15.5 % → 15.8 %
Code
# Identify available columns
available_metals <- detect_metal_columns(amd)
✓ Detected 12 metal columns:
# A tibble: 12 × 4
   column metal confidence priority
   <chr>  <chr>      <dbl>    <dbl>
 1 al     al           1          1
 2 as     as           1          1
 3 cd     cd           1          1
 4 cr     cr           1          1
 5 cu     cu           1          1
 6 fe     fe           1          1
 7 mn     mn           1          1
 8 pb     pb           1          1
 9 zn     zn           1          1
10 co     co           1          2
11 ni     ni           1          2
12 q_l_s  s            0.8        2
Code
actual_ph_col <- ifelse("p_h" %in% names(amd), "p_h", 
                       ifelse("ph" %in% names(amd), "ph", NA))

cat("Available metals for analysis:", paste(available_metals, collapse = ", "), "\n")
Available metals for analysis: al, as, cd, cr, cu, fe, mn, pb, zn, co, ni, q_l_s 
Code
cat("pH column:", ifelse(is.na(actual_ph_col), "Not found", actual_ph_col), "\n")
pH column: p_h 

Data Quality Assessment

Code
cat("\n=== DATA QUALITY ASSESSMENT ===\n")

=== DATA QUALITY ASSESSMENT ===
Code
# Basic data overview
cat("Dataset Overview:\n")
Dataset Overview:
Code
cat("================\n")
================
Code
cat("• Total samples:", nrow(amd), "\n")
• Total samples: 62 
Code
cat("• Total variables:", ncol(amd), "\n")
• Total variables: 38 
Code
cat("• Date range:", ifelse("date" %in% names(amd), 
                          paste(min(amd$date, na.rm = TRUE), "to", max(amd$date, na.rm = TRUE)), 
                          "No date column"), "\n")
• Date range: No date column 
Code
cat("• Unique sites:", ifelse("site" %in% names(amd), length(unique(amd$site)), "No site column"), "\n")
• Unique sites: 20 
Code
# Missing data summary
missing_summary <- amd %>%
  summarise(across(everything(), ~sum(is.na(.)))) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "missing_count") %>%
  mutate(missing_pct = missing_count / nrow(amd) * 100) %>%
  filter(missing_count > 0) %>%
  arrange(desc(missing_pct))

if (nrow(missing_summary) > 0) {
  cat("\nMissing Data Summary:\n")
  cat("====================\n")
  print(missing_summary %>% head(10))
}

Missing Data Summary:
====================
# A tibble: 10 × 3
   variable                     missing_count missing_pct
   <chr>                                <int>       <dbl>
 1 sulfate_mg_l                            24        38.7
 2 low_net_acidity_mg_l_ca_co3             16        25.8
 3 high_net_acidity_mg_l_ca_co3            14        22.6
 4 file_id_metal                           12        19.4
 5 al                                      12        19.4
 6 as                                      12        19.4
 7 cd                                      12        19.4
 8 co                                      12        19.4
 9 cr                                      12        19.4
10 cu                                      12        19.4
Code
# Generate metal summary statistics with Bolivian regulatory standards
if (length(available_metals) > 0) {
  # Create regulatory standards reference based on Table 3 in Strosnider et al. (2011)
  regulatory_standards <- tibble(
    metal = c("fe", "mn", "zn", "cu", "al", "pb", "cd", "cr", "ni", "as"),
    # Bolivian monthly discharge limits (mg/L)
    monthly_discharge = c(0.3, 1.5, 0.2, 0.5, 0.5, 0.05, 0.05, 0.05, NA, 0.15),
    # Class D receiving water body limits (mg/L) - lowest classification
    class_d_limit = c(1.0, 1.0, 0.2, 1.0, 1.0, 0.05, 0.005, 0.05, 0.5, 0.1),
    # Class A receiving water body limits (mg/L) - highest quality
    class_a_limit = c(0.3, 0.5, 0.2, 0.05, 0.2, 0.05, 0.005, 0.05, 0.05, 0.05)
  )
  
  metal_summary <- amd %>%
    select(all_of(available_metals)) %>%
    summarise(across(everything(), list(
      n = ~sum(!is.na(.)),
      mean = ~mean(., na.rm = TRUE),
      median = ~median(., na.rm = TRUE),
      min = ~min(., na.rm = TRUE),
      max = ~max(., na.rm = TRUE),
      sd = ~sd(., na.rm = TRUE)
    ))) %>%
    pivot_longer(everything(), names_to = "var", values_to = "value") %>%
    separate(var, into = c("metal", "stat"), sep = "_(?=[^_]+$)") %>%
    pivot_wider(names_from = stat, values_from = value) %>%
    left_join(regulatory_standards, by = "metal") %>%
    mutate(
      # Calculate exceedance percentages for monthly discharge limits
      pct_exceed_monthly = map2_dbl(metal, monthly_discharge, function(m, limit) {
        if (is.na(limit) || !m %in% names(amd)) return(NA)
        mean(amd[[m]] > limit, na.rm = TRUE) * 100
      }),
      # Calculate exceedance percentages for Class D receiving water limits
      pct_exceed_class_d = map2_dbl(metal, class_d_limit, function(m, limit) {
        if (is.na(limit) || !m %in% names(amd)) return(NA)
        mean(amd[[m]] > limit, na.rm = TRUE) * 100
      }),
      # Calculate exceedance percentages for Class A receiving water limits
      pct_exceed_class_a = map2_dbl(metal, class_a_limit, function(m, limit) {
        if (is.na(limit) || !m %in% names(amd)) return(NA)
        mean(amd[[m]] > limit, na.rm = TRUE) * 100
      })
    ) %>%
    arrange(desc(max))
  
  cat("\nMetal Concentration Summary (Based on Strosnider et al. 2011):\n")
  cat("=============================================================\n")
  print(metal_summary %>% 
        select(metal, n, mean, median, max, pct_exceed_monthly) %>%
        head(10))
} else {
  metal_summary <- tibble()
  cat("\nNo metal columns detected for summary statistics.\n")
}

Metal Concentration Summary (Based on Strosnider et al. 2011):
=============================================================
# A tibble: 10 × 6
   metal     n    mean  median     max pct_exceed_monthly
   <chr> <dbl>   <dbl>   <dbl>   <dbl>              <dbl>
 1 fe       50 5053.   1875    72100                   84
 2 zn       50 3649.   2965    19600                  100
 3 al       50  652.    273     7480                   88
 4 as       50   65.8     8.45   889                   68
 5 mn       50   64.9    44.9    402                   88
 6 cu       50   40.5    10.5    310                   70
 7 cd       50   13.1     9.42    65.3                 84
 8 pb       50    4.86    0.86    34.8                 74
 9 co       50    2.41    2.02    14.8                 NA
10 ni       50    2.69    2.08    11.4                 NA

Enhanced Visualizations

Code
cat("\n=== CREATING ENHANCED VISUALIZATIONS ===\n")

=== CREATING ENHANCED VISUALIZATIONS ===
Code
# Create visualization list
plots <- list()

# 1. Enhanced pH Distribution
if (!is.na(actual_ph_col) && actual_ph_col %in% names(amd)) {
  p1 <- amd %>%
    filter(!is.na(!!sym(actual_ph_col))) %>%
    ggplot(aes(x = !!sym(actual_ph_col))) +
    geom_histogram(aes(fill = after_stat(x < 4)), bins = 40, alpha = 0.7) +
    geom_vline(xintercept = c(2, 4, 6, 8), linetype = "dashed", alpha = 0.6) +
    scale_fill_manual(values = c("TRUE" = "#d32f2f", "FALSE" = "#1976d2"), guide = "none") +
    labs(
      title = "pH Distribution in Cerro Rico AMD",
      subtitle = "Showing extreme acidification across sampling sites",
      x = "pH", y = "Frequency",
      caption = "Vertical lines indicate pH thresholds: 2 (extremely acidic), 4 (very acidic), 6 (acidic), 8 (neutral-basic)"
    ) +
    theme_amd_enhanced()
  
  plots[["ph_dist"]] <- p1
}

# 2. Site-specific pH comparison
if (!is.na(actual_ph_col) && actual_ph_col %in% names(amd) && "site" %in% names(amd)) {
  p2 <- amd %>%
    filter(!is.na(!!sym(actual_ph_col))) %>%
    ggplot(aes(x = reorder(site, !!sym(actual_ph_col), median), y = !!sym(actual_ph_col))) +
    geom_boxplot(aes(fill = site), alpha = 0.7, outlier.alpha = 0.6) +
    geom_hline(yintercept = c(2, 4, 6, 8), linetype = "dashed", alpha = 0.4) +
    scale_fill_viridis_d(name = "Site") +
    labs(
      title = "pH Variability by Site",
      subtitle = "Comparing acid mine drainage across sampling locations",
      x = "Site", y = "pH",
      caption = "Sites ordered by median pH"
    ) +
    theme_amd_enhanced() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
  
  plots[["ph_sites"]] <- p2
}

# 3. Enhanced metal concentrations
if (length(available_metals) >= 4) {
  top_metals <- available_metals[1:min(4, length(available_metals))]
  
  metal_plot_data <- amd %>%
    select(all_of(c("site", top_metals))) %>%
    pivot_longer(cols = all_of(top_metals), names_to = "metal", values_to = "concentration") %>%
    filter(!is.na(concentration), concentration > 0) %>%
    group_by(metal) %>%
    mutate(
      log_conc = log10(concentration),
      metal_label = paste0(toupper(str_extract(metal, "^[A-Za-z]+")), " (mg/L)")
    ) %>%
    ungroup()
  
  p3 <- metal_plot_data %>%
    ggplot(aes(x = metal_label, y = log_conc)) +
    geom_violin(aes(fill = metal_label), alpha = 0.7) +
    geom_boxplot(width = 0.2, alpha = 0.8, outlier.alpha = 0.6) +
    scale_fill_viridis_d(name = "Metal") +
    labs(
      title = "Heavy Metal Concentrations Distribution",
      subtitle = "Log-scale distribution showing contamination levels",
      x = "Metal", y = "Log₁₀ Concentration (mg/L)",
      caption = "Violin plots show probability density; box plots show quartiles"
    ) +
    theme_amd_enhanced() +
    theme(legend.position = "none")
  
  plots[["metals_dist"]] <- p3
}

# 4. Seasonal comparison
if ("season" %in% names(amd) && length(available_metals) >= 2) {
  seasonal_data <- amd %>%
    select(all_of(c("season", available_metals[1:min(3, length(available_metals))]))) %>%
    pivot_longer(cols = -season, names_to = "metal", values_to = "concentration") %>%
    filter(!is.na(concentration), concentration > 0) %>%
    mutate(
      log_conc = log10(concentration),
      metal_label = toupper(str_extract(metal, "^[A-Za-z]+"))
    )
  
  p4 <- seasonal_data %>%
    ggplot(aes(x = season, y = log_conc, fill = season)) +
    geom_boxplot(alpha = 0.7, outlier.alpha = 0.6) +
    facet_wrap(~ metal_label, scales = "free_y") +
    scale_fill_manual(values = c("dry" = "#ff7043", "wet" = "#42a5f5"), name = "Season") +
    labs(
      title = "Seasonal Variation in Metal Concentrations",
      subtitle = "Comparing dry vs wet season contamination levels",
      x = "Season", y = "Log₁₀ Concentration (mg/L)",
      caption = "Wet season typically shows higher concentrations due to increased mobilization"
    ) +
    theme_amd_enhanced()
  
  plots[["seasonal"]] <- p4
}

# 5. Correlation heatmap
if (length(available_metals) >= 3) {
  cor_data <- amd %>%
    select(all_of(available_metals)) %>%
    select_if(is.numeric) %>%
    select_if(~sum(!is.na(.)) > 10)
  
  if (ncol(cor_data) >= 3) {
    cor_matrix <- cor(cor_data, use = "pairwise.complete.obs")
    
    cor_plot_data <- cor_matrix %>%
      as_tibble(rownames = "metal1") %>%
      pivot_longer(-metal1, names_to = "metal2", values_to = "correlation") %>%
      mutate(
        metal1_clean = toupper(str_extract(metal1, "^[A-Za-z]+")),
        metal2_clean = toupper(str_extract(metal2, "^[A-Za-z]+"))
      )
    
    p5 <- cor_plot_data %>%
      ggplot(aes(x = metal1_clean, y = metal2_clean, fill = correlation)) +
      geom_tile(color = "white", linewidth = 0.5) +
      geom_text(aes(label = round(correlation, 2)), color = "white", size = 3) +
      scale_fill_gradient2(low = "#d32f2f", mid = "white", high = "#1976d2", 
                          midpoint = 0, name = "Correlation") +
      labs(
        title = "Metal Concentration Correlations",
        subtitle = "Pearson correlation coefficients between metal concentrations",
        x = "Metal", y = "Metal",
        caption = "Strong positive correlations suggest common sources or similar mobility"
      ) +
      theme_amd_enhanced() +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))
    
    plots[["correlation"]] <- p5
  }
}

# Display plots
for (i in seq_along(plots)) {
  print(plots[[i]])
  cat("\n")
}

Enhanced Regulatory Compliance Analysis

Code
cat("\n=== BOLIVIAN REGULATORY COMPLIANCE ANALYSIS ===\n")

=== BOLIVIAN REGULATORY COMPLIANCE ANALYSIS ===
Code
cat("Based on Strosnider et al. (2011) - Table 3: Bolivian discharge and receiving water body criteria\n\n")
Based on Strosnider et al. (2011) - Table 3: Bolivian discharge and receiving water body criteria
Code
if (nrow(metal_summary) > 0) {
  # Create comprehensive compliance report based on Bolivian standards
  compliance_summary <- metal_summary %>%
    filter(!is.na(monthly_discharge)) %>%
    select(metal, max, monthly_discharge, class_d_limit, class_a_limit,
           pct_exceed_monthly, pct_exceed_class_d, pct_exceed_class_a) %>%
    mutate(
      monthly_exceedance = max / monthly_discharge,
      class_d_exceedance = max / class_d_limit,
      class_a_exceedance = max / class_a_limit,
      risk_level = case_when(
        monthly_exceedance > 1000 ~ "Critical",
        monthly_exceedance > 100 ~ "High", 
        monthly_exceedance > 10 ~ "Moderate",
        monthly_exceedance > 1 ~ "Low",
        TRUE ~ "Compliant"
      )
    ) %>%
    arrange(desc(monthly_exceedance))
  
  cat("BOLIVIAN REGULATORY COMPLIANCE SUMMARY:\n")
  cat("=======================================\n")
  
  # Critical violations of monthly discharge limits
  critical_violations <- compliance_summary %>% filter(risk_level == "Critical")
  if (nrow(critical_violations) > 0) {
    cat("🚨 CRITICAL VIOLATIONS (>1000x monthly discharge limit):\n")
    for (i in 1:nrow(critical_violations)) {
      cat("• ", toupper(critical_violations$metal[i]), ": ", 
          round(critical_violations$monthly_exceedance[i], 0), 
          "x monthly discharge limit (", critical_violations$max[i], " mg/L vs ", 
          critical_violations$monthly_discharge[i], " mg/L limit)\n", sep = "")
    }
    cat("\n")
  }
  
  # High risk violations
  high_risk <- compliance_summary %>% filter(risk_level == "High")
  if (nrow(high_risk) > 0) {
    cat("⚠️ HIGH RISK VIOLATIONS (100-1000x monthly discharge limit):\n")
    for (i in 1:nrow(high_risk)) {
      cat("• ", toupper(high_risk$metal[i]), ": ", 
          round(high_risk$monthly_exceedance[i], 1), 
          "x monthly discharge limit\n", sep = "")
    }
    cat("\n")
  }
  
  # Moderate violations
  moderate_risk <- compliance_summary %>% filter(risk_level == "Moderate")
  if (nrow(moderate_risk) > 0) {
    cat("⚠️ MODERATE VIOLATIONS (10-100x monthly discharge limit):\n")
    for (i in 1:nrow(moderate_risk)) {
      cat("• ", toupper(moderate_risk$metal[i]), ": ", 
          round(moderate_risk$monthly_exceedance[i], 1), 
          "x monthly discharge limit\n", sep = "")
    }
    cat("\n")
  }
  
  # Class D receiving water body impacts (lowest class - industrial use only)
  class_d_impacts <- compliance_summary %>% 
    filter(!is.na(class_d_exceedance) & class_d_exceedance > 1) %>%
    arrange(desc(class_d_exceedance))
  
  if (nrow(class_d_impacts) > 0) {
    cat("🏭 CLASS D RECEIVING WATER IMPACTS (Industrial/Navigation Use Only):\n")
    for (i in 1:min(5, nrow(class_d_impacts))) {
      cat("• ", toupper(class_d_impacts$metal[i]), ": ", 
          round(class_d_impacts$class_d_exceedance[i], 1), 
          "x Class D limit\n", sep = "")
    }
    cat("\n")
  }
  
  # Sample exceedance rates
  high_exceedance <- compliance_summary %>% 
    filter(!is.na(pct_exceed_monthly) & pct_exceed_monthly > 80) %>%
    arrange(desc(pct_exceed_monthly))
  
  if (nrow(high_exceedance) > 0) {
    cat("📊 HIGH EXCEEDANCE RATES (>80% of samples exceed monthly limits):\n")
    for (i in 1:nrow(high_exceedance)) {
      cat("• ", toupper(high_exceedance$metal[i]), ": ", 
          round(high_exceedance$pct_exceed_monthly[i], 1), 
          "% of samples exceed monthly discharge limit\n", sep = "")
    }
    cat("\n")
  }
  
  # pH compliance check
  if (!is.na(actual_ph_col) && actual_ph_col %in% names(amd)) {
    ph_violations <- mean(amd[[actual_ph_col]] < 6 | amd[[actual_ph_col]] > 9, na.rm = TRUE) * 100
    cat("🧪 pH COMPLIANCE:\n")
    cat("• Bolivian discharge limit: pH 6-9\n")
    cat("• ", round(ph_violations, 1), "% of samples violate pH discharge limits\n", sep = "")
    
    if (min(amd[[actual_ph_col]], na.rm = TRUE) < 1) {
      cat("• EXTREME: Minimum pH = ", round(min(amd[[actual_ph_col]], na.rm = TRUE), 1), 
          " (extremely acidic conditions)\n", sep = "")
    }
  }
  
  cat("\nREGULATORY CONTEXT (from Strosnider et al. 2011):\n")
  cat("================================================\n")
  cat("• Bolivian environmental law Number 1333 (1992) sets discharge limits\n")
  cat("• Article 45 requires mining operations to use environmentally compatible technology\n")
  cat("• Garcia-Guinea and Harffy (1998): 'Bolivian environmental law has been sadly ignored where mining is concerned'\n")
  cat("• Most AMD discharges exceeded limits by orders of magnitude\n")
  
} else {
  cat("No regulatory compliance analysis available - insufficient metal data.\n")
}
BOLIVIAN REGULATORY COMPLIANCE SUMMARY:
=======================================
🚨 CRITICAL VIOLATIONS (>1000x monthly discharge limit):
• FE: 240333x monthly discharge limit (72100 mg/L vs 0.3 mg/L limit)
• ZN: 98000x monthly discharge limit (19600 mg/L vs 0.2 mg/L limit)
• AL: 14960x monthly discharge limit (7480 mg/L vs 0.5 mg/L limit)
• AS: 5927x monthly discharge limit (889 mg/L vs 0.15 mg/L limit)
• CD: 1306x monthly discharge limit (65.3 mg/L vs 0.05 mg/L limit)

⚠️ HIGH RISK VIOLATIONS (100-1000x monthly discharge limit):
• PB: 696x monthly discharge limit
• CU: 620x monthly discharge limit
• MN: 268x monthly discharge limit

⚠️ MODERATE VIOLATIONS (10-100x monthly discharge limit):
• CR: 50.2x monthly discharge limit

🏭 CLASS D RECEIVING WATER IMPACTS (Industrial/Navigation Use Only):
• ZN: 98000x Class D limit
• FE: 72100x Class D limit
• CD: 13060x Class D limit
• AS: 8890x Class D limit
• AL: 7480x Class D limit

📊 HIGH EXCEEDANCE RATES (>80% of samples exceed monthly limits):
• ZN: 100% of samples exceed monthly discharge limit
• AL: 88% of samples exceed monthly discharge limit
• MN: 88% of samples exceed monthly discharge limit
• FE: 84% of samples exceed monthly discharge limit
• CD: 84% of samples exceed monthly discharge limit

🧪 pH COMPLIANCE:
• Bolivian discharge limit: pH 6-9
• 85.7% of samples violate pH discharge limits
• EXTREME: Minimum pH = 0.9 (extremely acidic conditions)

REGULATORY CONTEXT (from Strosnider et al. 2011):
================================================
• Bolivian environmental law Number 1333 (1992) sets discharge limits
• Article 45 requires mining operations to use environmentally compatible technology
• Garcia-Guinea and Harffy (1998): 'Bolivian environmental law has been sadly ignored where mining is concerned'
• Most AMD discharges exceeded limits by orders of magnitude

Seasonal Analysis

Code
cat("\n=== SEASONAL ANALYSIS ===\n")

=== SEASONAL ANALYSIS ===
Code
if ("season" %in% names(amd) && length(available_metals) > 0) {
  # Simplified seasonal analysis
  seasonal_summary <- amd %>%
    filter(!is.na(season)) %>%
    select(season, all_of(available_metals[1:min(5, length(available_metals))])) %>%
    pivot_longer(cols = -season, names_to = "metal", values_to = "concentration") %>%
    filter(!is.na(concentration)) %>%
    group_by(metal, season) %>%
    summarise(
      n = n(),
      mean_conc = mean(concentration, na.rm = TRUE),
      median_conc = median(concentration, na.rm = TRUE),
      sd_conc = sd(concentration, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    pivot_wider(
      names_from = season,
      values_from = c(n, mean_conc, median_conc, sd_conc),
      names_sep = "_"
    ) %>%
    mutate(
      # Calculate fold changes
      mean_fold_change = ifelse(!is.na(mean_conc_wet) & !is.na(mean_conc_dry) & mean_conc_dry > 0,
                               mean_conc_wet / mean_conc_dry, NA),
      median_fold_change = ifelse(!is.na(median_conc_wet) & !is.na(median_conc_dry) & median_conc_dry > 0,
                                 median_conc_wet / median_conc_dry, NA)
    )
  
  # Perform statistical tests for each metal
  if (nrow(seasonal_summary) > 0) {
    cat("Seasonal Comparison Results:\n")
    cat("============================\n")
    
    # Test each metal individually
    for (metal in unique(seasonal_summary$metal)) {
      metal_data <- amd %>%
        filter(!is.na(season), !is.na(!!sym(metal))) %>%
        select(season, concentration = all_of(metal))
      
      if (nrow(metal_data) > 10) {
        dry_values <- metal_data$concentration[metal_data$season == "dry"]
        wet_values <- metal_data$concentration[metal_data$season == "wet"]
        
        if (length(dry_values) > 0 && length(wet_values) > 0) {
          # Perform Wilcoxon test
          wilcox_result <- tryCatch({
            wilcox.test(wet_values, dry_values)
          }, error = function(e) NULL)
          
          if (!is.null(wilcox_result)) {
            metal_summary_row <- seasonal_summary %>% filter(metal == !!metal)
            
            if (wilcox_result$p.value < 0.05) {
              if (metal_summary_row$mean_fold_change > 1.2) {
                cat("• ", toupper(metal), ": Significantly higher in wet season (p = ", 
                    round(wilcox_result$p.value, 3), ", ", 
                    round(metal_summary_row$mean_fold_change, 2), "x increase)\n", sep = "")
              } else if (metal_summary_row$mean_fold_change < 0.8) {
                cat("• ", toupper(metal), ": Significantly higher in dry season (p = ", 
                    round(wilcox_result$p.value, 3), ", ", 
                    round(1/metal_summary_row$mean_fold_change, 2), "x decrease in wet)\n", sep = "")
              } else {
                cat("• ", toupper(metal), ": Significantly different between seasons (p = ", 
                    round(wilcox_result$p.value, 3), ")\n", sep = "")
              }
            }
          }
        }
      }
    }
    
    # Display summary table
    cat("\nSeasonal Statistics Summary:\n")
    print(seasonal_summary %>%
          select(metal, mean_conc_dry, mean_conc_wet, mean_fold_change) %>%
          mutate(across(where(is.numeric), ~round(., 3))))
  }
} else {
  cat("Seasonal analysis not available - missing season data or metal columns.\n")
}
Seasonal Comparison Results:
============================

Seasonal Statistics Summary:
# A tibble: 5 × 4
  metal mean_conc_dry mean_conc_wet mean_fold_change
  <chr>         <dbl>         <dbl>            <dbl>
1 al          456.          798.               1.75 
2 as           32.7          89.0              2.73 
3 cd           17.5          10.8              0.616
4 cr            0.115         0.337            2.94 
5 cu           31.2          47.8              1.53 

Enhanced Summary and Recommendations

Code
cat("\n=== ENHANCED SUMMARY AND RECOMMENDATIONS ===\n")

=== ENHANCED SUMMARY AND RECOMMENDATIONS ===
Code
# Generate comprehensive summary
summary_stats <- list(
  total_samples = nrow(amd),
  total_sites = ifelse("site" %in% names(amd), length(unique(amd$site)), 0),
  date_range = if("date" %in% names(amd)) paste(min(amd$date, na.rm = TRUE), "to", max(amd$date, na.rm = TRUE)) else "N/A",
  metals_analyzed = length(available_metals),
  ph_range = if(!is.na(actual_ph_col) && actual_ph_col %in% names(amd)) paste(round(min(amd[[actual_ph_col]], na.rm = TRUE), 1), "to", round(max(amd[[actual_ph_col]], na.rm = TRUE), 1)) else "N/A"
)

cat("COMPREHENSIVE ANALYSIS SUMMARY\n")
COMPREHENSIVE ANALYSIS SUMMARY
Code
cat("===============================\n")
===============================
Code
cat("• Total samples analyzed: ", summary_stats$total_samples, "\n")
• Total samples analyzed:  62 
Code
cat("• Number of sites: ", summary_stats$total_sites, "\n")
• Number of sites:  20 
Code
cat("• Study period: ", summary_stats$date_range, "\n")
• Study period:  N/A 
Code
cat("• Metals analyzed: ", summary_stats$metals_analyzed, "\n")
• Metals analyzed:  12 
Code
cat("• pH range: ", summary_stats$ph_range, "\n")
• pH range:  0.9 to 6.9 
Code
# Environmental risk assessment
cat("\nENVIRONMENTAL RISK ASSESSMENT\n")

ENVIRONMENTAL RISK ASSESSMENT
Code
cat("==============================\n")
==============================
Code
if (!is.na(actual_ph_col) && actual_ph_col %in% names(amd)) {
  extremely_acidic_pct <- mean(amd[[actual_ph_col]] < 2, na.rm = TRUE) * 100
  if (extremely_acidic_pct > 20) {
    cat("🚨 CRITICAL: ", round(extremely_acidic_pct, 1), "% of samples are extremely acidic (pH < 2)\n", sep = "")
  }
  
  very_acidic_pct <- mean(amd[[actual_ph_col]] < 4, na.rm = TRUE) * 100
  if (very_acidic_pct > 50) {
    cat("⚠️ HIGH RISK: ", round(very_acidic_pct, 1), "% of samples are very acidic (pH < 4)\n", sep = "")
  }
  
  cat("• pH Statistics: Mean =", round(mean(amd[[actual_ph_col]], na.rm = TRUE), 2), 
      ", Median =", round(median(amd[[actual_ph_col]], na.rm = TRUE), 2), "\n")
} else {
  cat("• pH analysis not available\n")
}
⚠️ HIGH RISK: 78.6% of samples are very acidic (pH < 4)
• pH Statistics: Mean = 3.62 , Median = 3.2 
Code
# Metal contamination summary
if (length(available_metals) > 0) {
  cat("• Metals with highest concentrations: ", paste(available_metals[1:min(3, length(available_metals))], collapse = ", "), "\n")
}
• Metals with highest concentrations:  al, as, cd 
Code
# Priority recommendations
cat("\nPRIORITY RECOMMENDATIONS\n")

PRIORITY RECOMMENDATIONS
Code
cat("========================\n")
========================
Code
cat("1. 🎯 IMMEDIATE ACTIONS:\n")
1. 🎯 IMMEDIATE ACTIONS:
Code
cat("   • Implement emergency treatment systems for critical sites\n")
   • Implement emergency treatment systems for critical sites
Code
cat("   • Restrict access to contaminated water sources\n")
   • Restrict access to contaminated water sources
Code
cat("   • Begin ecological impact assessment\n")
   • Begin ecological impact assessment
Code
cat("\n")
Code
cat("2. 📋 SHORT-TERM MEASURES (1-6 months):\n")
2. 📋 SHORT-TERM MEASURES (1-6 months):
Code
cat("   • Install neutralization systems at high-priority sites\n")
   • Install neutralization systems at high-priority sites
Code
cat("   • Implement real-time monitoring networks\n")
   • Implement real-time monitoring networks
Code
cat("   • Begin source control measures\n")
   • Begin source control measures
Code
cat("\n")
Code
cat("3. 🔄 LONG-TERM STRATEGIES (6+ months):\n")
3. 🔄 LONG-TERM STRATEGIES (6+ months):
Code
cat("   • Comprehensive watershed restoration\n")
   • Comprehensive watershed restoration
Code
cat("   • Sustainable mining practice implementation\n")
   • Sustainable mining practice implementation
Code
cat("   • Community health monitoring programs\n")
   • Community health monitoring programs
Code
cat("\n")
Code
# Technical recommendations
cat("TECHNICAL RECOMMENDATIONS\n")
TECHNICAL RECOMMENDATIONS
Code
cat("==========================\n")
==========================
Code
cat("• Treatment Technology: Multi-stage neutralization with selective precipitation\n")
• Treatment Technology: Multi-stage neutralization with selective precipitation
Code
cat("• Monitoring Frequency: Monthly for critical sites, quarterly for moderate sites\n")
• Monitoring Frequency: Monthly for critical sites, quarterly for moderate sites
Code
cat("• Priority Parameters: pH, Fe, Mn, Zn, Cu, Al, SO₄²⁻\n")
• Priority Parameters: pH, Fe, Mn, Zn, Cu, Al, SO₄²⁻
Code
cat("• Quality Assurance: Implement duplicate sampling and certified reference materials\n")
• Quality Assurance: Implement duplicate sampling and certified reference materials
Code
# Data quality recommendations
cat("\nDATA QUALITY RECOMMENDATIONS\n")

DATA QUALITY RECOMMENDATIONS
Code
cat("=============================\n")
=============================
Code
cat("• Standardize sampling protocols across all sites\n")
• Standardize sampling protocols across all sites
Code
cat("• Implement quality control measures (blanks, duplicates, spikes)\n")
• Implement quality control measures (blanks, duplicates, spikes)
Code
cat("• Use certified reference materials for method validation\n")
• Use certified reference materials for method validation
Code
cat("• Document metadata for all samples (weather, flow conditions, etc.)\n")
• Document metadata for all samples (weather, flow conditions, etc.)
Code
cat("\n=== ENHANCED ANALYSIS COMPLETE ===\n")

=== ENHANCED ANALYSIS COMPLETE ===

Conclusions

This enhanced analysis reproduces and extends the findings of Strosnider et al. (2011) on acid mine drainage at Cerro Rico de Potosí, revealing the extent of environmental contamination from five centuries of mining operations.

Key Findings from Strosnider et al. (2011):

Extreme Metal Concentrations: - Iron: 0.12 to 72,100 mg/L (Pailaviri tailings seep - site 4T) - Aluminum: 0.11 to 7,480 mg/L - Zinc: 0.24 to 19,600 mg/L - Arsenic: <0.022 to 889 mg/L - Copper: <0.001 to 310 mg/L - Lead: <0.012 to 34.8 mg/L - Cadmium: <0.0006 to 65.3 mg/L - Manganese: 0.3 to 402 mg/L

Extreme Acidification: - pH range: 0.90 to 6.94 standard units - Net acidity: -10 to 246,000 mg/L as CaCO₃ equivalent - Most extreme case: Pailaviri tailings seep (site 4T) with pH = 0.9

Regulatory Violations: - All but one AMD discharge exceeded Bolivian monthly discharge limits - Most exceeded by orders of magnitude for multiple metals - Fourteen discharges had impermissible Zn concentrations (often 3-4 orders of magnitude above limits) - Thirteen discharges violated pH requirements

Historical Impact:

The study demonstrates that if observed loadings are historically representative, Cerro Rico AMD has contributed thousands of tonnes of ecotoxic metals to the upper Rio Pilcomayo over the last five centuries.

Site-Specific Findings:

Most Contaminated Sources: 1. Pailaviri tailings seep (4T): Most extreme concentrations, rivaling Iron Mountain (California) 2. Active mine portals (1C-7C): Generally higher concentrations than abandoned portals 3. Abandoned portals (1A-5A): Connected to active workings, showing ongoing contamination

Seasonal Patterns: - Wet season: Marginally greater loadings for most metals (Al +26%, As +12%, Fe +88%, Mn +76%, Ni +45%, Pb +57%) - Flow increases during wet season mobilize more contaminants

Environmental Significance:

Downstream Contamination: - AMD contributes to violations of Class D receiving water standards (lowest classification - industrial use only) - Metal concentrations orders of magnitude above background levels documented 500+ km downstream - Isotopic studies confirm mining sources of contamination

Ecological Impact: - Concentrations exceed aquatic life criteria by orders of magnitude - Historic precedent: Agricola (1556) noted that mine drainage “destroys the fish or drives them away”

Technical Recommendations:

Immediate Actions: 1. Emergency treatment for critical sites (especially 4T - Pailaviri tailings) 2. Source control measures at active operations 3. Real-time monitoring networks

Treatment Approach: - Passive treatment systems recommended due to economic constraints - Locally available materials: limestone and organic substrates (llama dung, cattle manure, brewery waste) - Hybrid active-passive systems for extreme cases - Co-treatment with municipal wastewater shows promise

Long-term Strategy: 1. Comprehensive watershed restoration 2. Enforcement of environmental regulations (Law 1333) 3. Sustainable mining practices 4. Community health monitoring

Regulatory Context:

Despite Bolivia’s environmental law Number 1333 (1992) establishing discharge limits, Garcia-Guinea and Harffy (1998) observed that “Bolivian environmental law has been sadly ignored where mining is concerned.” This study provides the first quantitative documentation of the extent of this non-compliance.

Research Contributions:

This analysis provides: - First peer-reviewed characterization of Cerro Rico AMD sources - Quantitative link between upstream sources and downstream contamination - Baseline data for future remediation efforts - Framework for assessing mining impacts in similar settings

The findings underscore the urgent need for environmental intervention while considering the economic realities of one of the Western Hemisphere’s poorest regions, where approximately 150,000 residents depend on mining-related activities for their livelihoods.